Simplification of SQL queries using generalized inference propagation and generalized transitive closure

ABSTRACT

A method and apparatus for query simplification by applying generalization inference propagation and transitive closure in SQL queries having intersection operations combined with inner and outer join operations further combined with selection and/or projection operations. First a strong set of predicates is calculated, typically a selection or join predicate which is null-intolerant. Second the strong set of attributes is used to select and replace a full, left, or right outer join operation found as the operand of an intersection operation in the query. A less expensive outer or inner join replaces the selected join and the process is repeated until no further simplification if possible. Inferences are propagated from one operand to the other of an intersection operation. Selection operators are applied through transitive closure to reduce the size of intermediate results. The transformations are applied to the query to produces a set of operations that perform more efficiently than that of the original query while providing the same results.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates in general to database management systemsperformed by computers, and in particular, to a method and apparatus forquery simplification using generalized inference propagation andgeneralized transitive closure in SQL queries having selection,projection, join, outer join, and intersection operations.

2. Description of Related Art

Computer systems incorporating Relational DataBase Management System(RDBMS) software using a Structured Query Language (SQL) interface arewell known in the art. The SQL interface has evolved into a standardlanguage for RDBMS software and has been adopted as such by both theAmerican Nationals Standard Organization (ANSI) and the InternationalStandards Organization (ISO).

In RDBMS software, all data is externally structured into tables. TheSQL interface allows users to formulate relational operations on thetables either interactively, in batch files, or embedded in hostlanguages such as C, COBOL, etc. Operators are provided in SQL thatallow the user to manipulate the data, wherein each operator operates oneither one or two tables and produces a new table as a result. The powerof SQL lies on its ability to link information from multiple tables orviews together to perform complex sets of procedures with a singlestatement.

The current state-of-the-art in SQL query optimization providestechniques for simplifying queries based on the nature ofnull-intolerant predicates, as reflected in the following patentapplication and publication, both of which are incorporated by referenceherein:

1. U.S. patent application Ser. No. 08/326,461, filed Oct. 20, 1994, byG. Bhargava, P. Goel, and B. Iyer, entitled "METHOD AND APPARATUS FORREORDERING COMPLEX SQL QUERIES CONTAINING INNER AND OUTER JOINOPERATIONS," (hereinafter referred to as BHAR94!); and

2. Galindo-Legaria, C. A., "Algebraic optimization of outer joinqueries," Ph.D. dissertation, Dept. of Applied Science, HarvardUniversity, Cambridge, 1992, (hereinafter referred to as GALI92b!).

However, these prior art techniques do not simplify queries in thepresence of intersection operators. Further, these prior art techniquesdo not include any scheme for inference propagation.

One prior art simplification technique uses selection predicates for"removing arrows". As explained in the following publication:Galindo-Legaria, C., and Rosenthal, A., "How to extend a conventionaloptimizer to handle one- and two-sided outer join," Proceedings of DataEngineering, pp. 402-409, 1992, (hereinafter referred to as GALI92a!),selections may cause the replacement of full and one-sided outer joinswith one-sided outer joins and joins, respectively. Thesesimplifications are based on the recursive application of the followingidentities for relations X₁ =(R₁, V₁, E₁) and X₂ =(R₂, V₂, E₂):

    σ.sub.p (X.sub.1 →X.sub.2)=σ.sub.p (X.sub.1 X.sub.2)

wherein R₁ and R₂ are non-empty sets of real attributes, V₁ and V₂ arenon-empty sets of virtual attributes, E₁ and E₂ are extensions of therelations, σ_(p) is a selection operator with predicate p, → is a leftouter join operator, is a join operator, sch(p).OR right.R₂, and p isnull-intolerant in R₂ ; and:

    σ.sub.p (X.sub.1 ⃡X.sub.2)=σ.sub.p (X.sub.1 →X.sub.2)

wherein R₁ and R₂ are non-empty sets of real attributes, V₁ and V₂ arenon-empty sets of virtual attributes, E₁ and E₂ are extensions of therelations, σ_(p) is a selection operator with predicate p, ⃡ is a fullouter join operator, → is a left outer join operator, sch(p).ORright.R₁, and p is null-intolerant in R₁.

Another prior art simplification technique for "removing arrows" usespredicates for join and outer join operators. The following identitiesare used recursively to replace full outer join and one-sided outer joinoperations with one-sided outer join and join operations, respectively.These identities are a formalization and extension of the ideaspresented in GALI92b!. Note, however, that GALI92b! does not considerouter join predicates that reference more than two relations.

Let X_(i) =, where 1≦i≦3, be relational expressions and let p be anull-intolerant predicate. Then, ##EQU1## if sch(p) ∩ R₃ ≠.o slashed.;##EQU2## if sch(p) ∩ R₂ ≠.o slashed.; ##EQU3## if sch(p) ∩ R₂ ≠.oslashed. and sch(p) ∩ R₃ ≠.o slashed.; ##EQU4## if sch(p) ∩ R₃ ≠.oslashed.; ##EQU5## if sch(p) ∩ R₂ ≠.o slashed.; and ##EQU6## if sch(p) ∩R₂ ≠.o slashed. and sch(p) ∩ R₃ ≠.o slashed..

After applying the above identities, the original query is transformedinto a form where all "redundant" (full) outer join operations have beenremoved.

The following publications present techniques that exploit certainproperties of intersection distinct operators to transform them tojoins:

1. Pirahesh, H., Hellerstein, J. M. and Hasan, W., "Extensible/rulebased query rewrite optimization in STARBURST," SIGMOD, pp. 39-48, SanDiego, Calif., June 1992, (hereinafter referred to as PIRA92!); and

2. Paulley, G. N. and Larson, P.-A., "Exploiting uniqueness in queryoptimization," CASCON, pp. 804-822, 822, Vol. II, October 1993,(hereinafter referred to as PAUL93!).

However, the techniques discussed by these publications are not the sameas disclosed by the present invention, although they are stillapplicable after the simplifications proposed by the present inventionhave been applied.

Finally, the prior art on predicate transitive closure consists ofidentifying equivalence classes of attributes that are equated thoughequi-join predicates and then applying the given selection conditions onany of them to the whole class, as described in the followingpublication Alon Levy, Inderpal Mumick, Yehoshua Sagiv, "QueryOptimization by Predicate Move-Around," Proceedings of the 20th VLDBConference, Santiago, Chile, September, 1994, (hereinafter referred toas " LEVY94!"). This idea can be formally stated in the form of thefollowing identities. The first two of these identities constitutepush-up and push-down rules for σ. Let X=(R_(X), V_(X), E_(X)) andY=(R_(Y), V_(Y), E_(Y)) be relational expressions, p be a selectionpredicate, and J_(XY) be a null-intolerant equi-join predicate. Then,##EQU7## where sch(p).OR right.R_(X) ; ##EQU8## where sch(p).ORright.R_(Y) ; ##EQU9## where sch(p).OR right.R_(X), sch(p').ORright.R_(Y), and p' is obtained from p by replacing every clause of theform X_(i) θc by Y_(j) θc, where θ is a relational operator, X_(i)=Y_(j) is a clause in J_(XY) ; and ##EQU10## where sch(p).ORright.R_(Y), sch(p).OR right.R_(X), and p' is obtained from p byreplacing every clause of the form Y_(j) θc by X_(i) θc, where X_(i)=Y_(j) is a clause in J_(XY). However, these identities work forjoin-only queries.

Thus, there is a need in the art for query simplification methods thatwork for queries involving (full) outer joins operators, intersectionoperators, as well as join operators.

SUMMARY OF THE INVENTION

To overcome the limitations in the prior art described above, and toovercome other limitations that will become apparent upon reading andunderstanding the present specification, the present invention disclosesa method and apparatus for query simplification using generalizedinference propagation and generalized transitive closure in SQL querieshaving selection, projection, join, outer join, and intersectionoperations.

An object of the present invention is to propagate inferences about oneoperand of an intersection operation to the other operand, and then touse this information to simplify the query by transforming outer joinoperations to less expensive joins. Another object of the presentinvention is to apply selection operators through transitive closure tomake intermediate results of SQL queries smaller. Yet another object ofthe present invention is to apply "null-intolerance" properties of joinpredicates in the presence of intersection operators to further simplifySQL queries.

BRIEF DESCRIPTION OF THE DRAWINGS

Referring now to the drawings in which like reference numbers representcorresponding parts throughout:

FIG. 1 illustrates the computer hardware environment of the presentinvention;

FIG. 2 is a flowchart illustrating the steps necessary for theinterpretation and execution of SQL statements in an interactiveenvironment according to the present invention;

FIG. 3 is a flowchart illustrating the steps necessary for theinterpretation and execution of SQL statements embedded in source codeaccording to the present invention;

FIG. 4 is a flowchart illustrating the method of optimizing SQL queriesof the present invention; and

FIGS. 5A-5H together are a flowchart describing the generalizedinference propagation method that determines which of the "component"attribute pairs of the "p* intersection-predicate" become join-reducedand then uses the simplification identities to simplify the query.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

In the following description of the preferred embodiment, reference ismade to the accompanying drawings which form a part hereof, and in whichis shown by way of illustration a specific embodiment in which theinvention may be practiced. It is to be understood that otherembodiments may be utilized and structural and functional changes may bemade without departing from the scope of the present invention.

DEFINITIONS

Following are definitions for a number of terms used in SQL queries.These definitions are required for an understanding of later portions ofthe present specification. A complete algebra for relations withduplicates is included in BHAR94!, which is incorporated by reference.Only the more pertinent definitions are included here, suitably modifiedto make the presentation more readable.

Tuple

A tuple t is a mapping from a finite set of attributes, R∪V, to a set ofatomic (possibly null) values, where R is a non-empty set of realattributes and V is a non-empty set of virtual attributes, R∩V=.oslashed., and the mapping t maps at least one virtual attribute vεV to anon-null value. For an attribute set X, t X! represents the valuesassociated with attributes X under the mapping t, where X.OR right.R∪Vand X≠.o slashed..

The motivation behind the distinction between real and virtualattributes is to emphasize the difference between the real attributesavailable for manipulation (to the user of the RDBMS) and the virtualattributes used (by the RDBMS) for bookkeeping only. The set of realattributes of a tuple is the same as the schema of the tuple in thetraditional relational algebra. These attributes are accessible to usersand can be referenced externally, e.g., in user queries, etc. On theother hand, virtual attributes are (at times) used to provide uniqueconceptional tuple-ids to tuples, and are not accessible to users andcannot be referenced externally.

Relation

A relation r is a triple (R, V, E) where R is a non-empty set of realattributes, V is a non-empty set of virtual attributes, and E, theextension of relation r, is a set of tuples such that:

    (∀t.sub.1 εE)(∀t.sub.2 εE)(t.sub.1 ≠t.sub.2 t.sub.1  V!≠t.sub.2  V!)

In this definition, R∪V is called the schema of relation r.

Predicate

A predicate p over a set of real attributes sch(p), called the schema ofp, is a total mapping from tuples to the Boolean values {TRUE, FALSE},where sch(p) is the minimum set of attributes such that for all tuplest₁ and t₂ :

    (t.sub.1  sch(p)!=t.sub.2  sch(p)!p(t.sub.1)=p(t.sub.2)).

For a tuple t with real schema R.OR left.sch(p), p(t) is TRUE if andonly if (∀A εsch(p)) (i.e., substitution of t A! for A in p causes it toevaluate to TRUE).

Null-intolerant

A predicate p is null-intolerant if p evaluates to FALSE for tuplesundefined on one or more attributes in sch(p). More formally, p isnull-intolerant if:

    (∀t) (∃Aεsch(p))(t A!=NULL p(t)=FALSE)

Algebraic Operators

Following are definitions for algebraic operators used in SQL queries.These definitions are required for an understanding of later portions ofthe present specification.

Projection

The projection, π^(a) _(x) (r), of relation r onto attributes X is therelation (X,V, E') where X.OR right.R and:

    E'={t.v|(∃t'εE)(t=t' X!v=t' V!)}

The π^(a) operator is a projection operator that does not remove"duplicates" in the real attributes part of the source expression. Thesuperscript a in π^(a) denotes the fact that all the virtual attributesof the source expression are included in the virtual schema of theresult expression. For ease of reading, the superscript a is omittedfrom π^(a) whenever there is no ambiguity, so it can be written simplyas π.

The projection, π^(c) _(X).sbsb.R_(X).sbsb.V (r), of relation r onattributes X_(R) X_(V) is the relation (X_(R), X_(V), E'), where X_(R).OR right.R, X_(V) .OR right.V and:

    E'={t.v|(∃t'εE)(t=t' X.sub.R !v=t' X.sub.v !)}

In contrast to π, π^(c) allows a choice in the selection of the virtualattributes from the source expression.

Delta-Projection

The delta-projection, δ_(X).sbsb.R_(X).sbsb.V (r), of relation r onattributes X_(R) X_(V) is the relation (X_(R) X_(V), V_(new), E'), whereX_(R) .OR right.R, X_(V) .OR right.V, and:

    E'={t|(∃t'εE)(t X.sub.R X.sub.V !=t' X.sub.R X.sub.V !t V.sub.new !=a new unique value)}

which is a new, unique value. The δ operator models the "SELECT DISTINCT. . . " construct of SQL which allows elimination of "duplicates" from arelation. The δ operator is called the distinct projection operator andproduces a result relation which has distinct values on the attributesX_(R) X_(V) and a new virtual attribute.

Selection

The selection, σ_(p) (r), of relation r on predicate p is the relation(R, V, E'), where sch(p).OR right.R, and:

    E'={t|(tεE)p(t)}

Union Compatible

A relation r₁ =({A₁, A₂, . . . , A_(n) }, V, E₁) is said to be unioncompatible with relation R₂ =({B₁, B₂, . . . , B_(n) }, V, E₂) if thereexists a permutation p of the numbers 1, 2, . . . , n such that domain(A₁)=domain (B_(pi)), for 1≦i≦n. That is, the attributes of r₁ and r₂can be ordered such that the domains of the first attributes of r₁ andr₂ are the same, the domains of the second attributes of r₁ and r₂ arethe same, and so on.

Union and Outer Union

The union, r₁ ∪r₂, of relations r₁ and r₂ is the relation (R, V, E₁∪E₂). The outer union, r₁ r₂, is the relation (R₁ ∪R₂, v₁ ∪v₂, E'),where:

    E'={t|(∃t'εE.sub.1)(t R.sub.1 V.sub.1 !=t(∀aε(R.sub.2 -R.sub.1)∪(v.sub.2 -v.sub.1))(t A!=NULL))(∃t"εE.sub.2)(t R.sub.2 V.sub.2 !=t"(∃Aε(R.sub.1 -R.sub.2)∪(v.sub.1 -v.sub.2))(t A!=NULL))}

Note that rows in r₁ r₂ are padded with nulls for those attributes thatare not present either in relation r₁ or in relation r₂.

Additional Algebraic Operators

In the following definitions, it is assumed that if predicate p isassociated with join/outer/full outer join of relations r₁ and r₂ suchthat sch(p)∩R₁ ≠.o slashed., sch(p)∩R₂ ≠.o slashed., and sch(p).ORright.R₁ ∪R₂.

Join

The join, ##EQU11## of relations r₁ and r₂ is the relation (R₁ R₂, V₁V₂, E'), where:

    E'={t|tε(E.sub.1 ×E.sub.2)p(t)}

Left and Right Outer Joins

The left outer join, ##EQU12## is the relation (R₁ R₂, V₁ V₂, E'),where: ##EQU13## Relation r₁ in the above definition is called thepreserved relation and relation r₂ is called the null supplyingrelation. The right outer join, ##EQU14## can similarly be defined inwhich r₁ is the null supplying relation and r₂ is the preservedrelation.

Full Outer join

The full outer join, ##EQU15## of relations r₁ and r₂ is the relation(R₁ R₂, V₁ V₂, E'), where: ##EQU16## Equivalence

Values v₁ and v₂ are said to be equivalent, which is denoted as v₁ ≡v₂,if both v₁, v₂ are non-NULL and v₁ =v₂, or if both v₁, v₂ are NULL.

Intersect Distinct

Let r₁ =({A₁, A₂, . . . , A_(n) }, V, E₁) and r₂ =({B₁, B₂, . . . ,B_(n) }, V, E₂) be two union compatible relations. Then, the intersectdistinct, r₁ ∩_(d) r₂, of r₁ and r₂ is the relation ({C₁, C₂, . . . ,C_(n) }, V, E), where each C_(i) is a possibly renamed version of theunion compatible attribute pair (A_(i), B_(i)), 1≦i≦n, and

    E={t|(∃t.sub.1 εE.sub.1)(∃t.sub.2 εE.sub.2)(∀i)(t C.sub.i !≡t.sub.1  A.sub.i !t C.sub.i !≡t.sub.2  B.sub.i !)t V! is a new unique value}

The intersect distinct operation retains the common tuples in relationsr₁ and r₂. If a tuple t₁ εE₁ contains null value in attribute A_(i) andt₂ εE₂ contains null value in attribute B_(i), 1≦i≦n, and identicalnon-null values in the remaining attributes, then t₁ and t₂ areconsidered equivalent and only a single copy is retained in the result.In case there are duplicate tuples in either of the operands, only onecopy of the common tuple is retained in the result. In contrast, theintersect all operator, denoted as ∩_(a), retains "some" of theduplicate copies of the common tuples, subject to the "minimum rule".More precisely, in two union compatible relations r₁ and r₂, if a tuplet appears i times in r₁ and j times in r₂, then t appears min {i,j}times in the result relation r₁ ∩_(a) r₂.

Expressions and expression trees

The following provides a recursive definition of expressions.

1. If r=(R, V, E) is a relation, then r is an expression. Henceforth,the shorthand notation X will be used to represent the expressionX=(R_(x), V_(x), E_(x)),

2. If X=(R_(x), V_(x), E_(x)) is an expression, then π^(a) _(x), (e) isan expression, where X'.OR right.R_(x).

3. If X=(R_(x), V_(x), E_(x)) is an expression, thenδ_(X).sbsb.R_(X).sbsb.V (e) is an expression, where X_(R) .ORright.R_(x) and X_(v) .OR right.V_(x).

4. If X=(R_(x), V_(x), E_(x)) is an expression, then σ_(p) (X) is anexpression, where sch(p).OR right.R_(x).

5. If X=(R_(x), V_(x), E_(x)) and Y=(R_(y), V_(y), E_(y)) areexpressions, then ##EQU17## is an expression, where: ##EQU18## and p isa predicate such that sch(p)∩R_(x) ≠.o slashed., sch(p)∩R_(y) ≠.oslashed., and sch(p).OR right.R_(X) ∪R_(Y).

6. If X=(R_(x), V_(x), E_(x)) is an expression, then so is (X), where(X)=(R_(x), V_(x), E_(x)). This is termed parenthesization, and isrequired due to the fact that some of the binary operations definedabove are not fully associative. Parenthesization determines theevaluation order so that expressions can be evaluated unambiguously.However, whenever there is no ambiguity, parentheses will be droppedfreely.

An expression can also be represented by a corresponding expression treein which the inner nodes are the operators occurring in the expressionand the leaves are the base relations referenced in the expression. Let##EQU19## denote one of the binary operators defined in the previoussection, then an expression tree T with left sub-tree T₁, right sub-treeT_(r) and root ##EQU20## is denoted by: ##EQU21## Henceforth, the twoequivalent representations are used interchangeably.

OVERVIEW

The present invention discloses a method and apparatus forsimplification of SQL queries using generalized inference propagation.The presence of intersection operations in queries permits the presentinvention to propagate inferences about one operand of the intersectionoperation to the other operand, in order to simplify the query bytransforming outer join operations to less expensive joins. Further, byapplying selection operators through transitive closure, the presentinvention can make intermediate results of SQL queries smaller. Finally,by applying "null-intolerance" properties of join predicates in thepresence of intersection operators, the present invention can furthersimplify SQL queries.

The first major aspect of the present invention is the use of"generalized inference propagation" to simplify SQL queries. Anintersection operator in an SQL query produces tuples that belong toboth its input operands. Two tuples are deemed to be equal if, for everyattribute, they have either the same non-null value or are both null.This property of the intersection operator, along with the"null-intolerant" nature of (outer) join predicates, allows the presentinvention to simplify queries by replacing expensive outer joinoperations with cheaper inner joins, etc. Also, since the intersectionoperator retains tuples that exist in both its operands, the presentinvention can apply any known assertion about one of the operands to theother. For example, if it is known that one of the operands of theintersection operator has non-null values for an attribute, then alltuples from the other operand that contain nulls in this attribute canbe "filtered" away by appropriate selection or by changing anintermediate operator.

The second major aspect of the present invention is the use of"generalized transitive closure" to simplify SQL queries. In the priorart, optimizers use a technique known as "predicate transitive closure"for the purpose of introducing additional, logically correct, selectionpredicates into the query expression. This helps in reducing the size ofthe intermediate relations and, consequently, the query evaluation cost.However, for queries that include an outer join operation, theequivalence-class-based transitive closure method does not work.Further, unlike join-only queries, it is not always correct for outerjoin operations to push the selection predicate down close to therelation on which the selection predicate is specified. For example:##EQU22## However, the present invention introduces cost-reducing,logically correct selections into query expressions involving (full)outer joins. The present invention uses such generalized transitiveclosure to simplify SQL queries.

HARDWARE ENVIRONMENT

FIG. 1 illustrates an exemplary computer hardware environment that couldbe used with the present invention. In the exemplary environment, acomputer system 102 is comprised of one or more processors connected toone or more electronic storage devices 104 and 106, such as disk drives,that store one or more relational databases.

Operators of the computer system 102 use a standard operator interface108, such as IMS/DB/DC, CICS, TSO, OS/2 or other similar interface, totransmit electrical signals to and from the computer system 102 thatrepresent commands for performing various search and retrievalfunctions, termed queries, against the databases. In the presentinvention, these queries conform to the Structured Query Language (SQL)standard, and invoke functions performed by Relational DataBaseManagement System (RDBMS) software. In the preferred embodiment of thepresent invention, the RDBMS software comprises the DB2 product offeredby IBM for the MVS or OS/2 operating systems. Those skilled in the artwill recognize, however, that the present invention has application toany RDBMS software that uses SQL.

As illustrated in FIG. 1, the DB2 architecture for the MVS operatingsystem includes three major components: the IMS Resource Lock Manager(IRLM) 110, the Systems Services module 112, and the Database Servicesmodule 114. The IRLM 110 handles locking services, because DB2 treatsdata as a shared resource, thereby allowing any number of users toaccess the same data simultaneously, and thus concurrency control isrequired to isolate users and to maintain data integrity. The SystemsServices module 112 controls the overall DB2 execution environment,including managing log data sets 106, gathering statistics, handlingstartup and shutdown, and providing management support.

At the center of the DB2 architecture is the Database Services module114. The Database Services module 114 contains several submodules,including the Relational Database System (RDS) 116, the Data Manager118, the Buffer Manager 120 and other components 122 such as an SQLcompiler/interpreter. These submodules support the functions of the SQLlanguage, i.e., definition, access control, retrieval, and update ofuser and system data.

INTERACTIVE SQL EXECUTION

FIG. 2 is a flowchart illustrating the steps necessary for theinterpretation and execution of SQL statements in an interactiveenvironment according to the present invention. Block 202 represents theinput of SQL statements into the computer system 102 from the user.Block 204 represents the step of compiling or interpreting the SQLstatements. An optimization function within block 204 may transform theSQL query in a manner described in more detail later in thisspecification. Block 206 represents the step of generating a compiledset of runtime structures called an application plan from the compiledSQL statements. Generally, the SQL statements received as input from theuser specify only the data that the user wants, but not how to get toit. This step considers both the available access paths (indexes,sequential reads, etc.) and system held statistics on the data to beaccessed (the size of the table, the number of distinct values in aparticular column, etc.), to choose what it considers to be the mostefficient access path for the query. Block 208 represents the executionof the application plan, and block 210 represents the output of theresults of the application plan to the user.

EMBEDDED/BATCH SQL EXECUTION

FIG. 3 is a flowchart illustrating the steps necessary for theinterpretation and execution of SQL statements embedded in source codeaccording to the present invention. Block 302 represents program sourcecode containing a host language (such as COBOL or C) and embedded SQLstatements. The program source code is then input to a pre-compile step304. There are two outputs from the pre-compile step 304: a modifiedsource module 306 and a Database Request Module (DBRM) 308. The modifiedsource module 306 contains host language calls to DB2, which thepre-compile step 304 inserts in place of SQL statements. The DBRM 308consists of the SQL statements from the program source code 302. Acompile and link-edit step 310 uses the modified source module 306 toproduce a load module 312, while an optimize and bind step 314 uses theDBRM 308 to produce a compiled set of runtime structures for theapplication plan 316. As indicated above in conjunction with FIG. 2, theSQL statements from the program source code 302 specify only the datathat the user wants, but not how to get to it. The optimize and bindstep 314 may reorder the SQL query in a manner described in more detaillater in this specification. Thereafter, the optimize and bind step 314considers both the available access paths (indexes, sequential reads,etc.) and system held statistics on the data to be accessed (the size ofthe table, the number of distinct values in a particular column, etc.),to choose what it considers to be the most efficient access path for thequery. The load module 312 and application plan 316 are then executedtogether at step 318.

SQL QUERY OPTIMIZATION

FIG. 4 is a flowchart illustrating the method of optimizing SQL queriesin steps 204 of FIG. 2 and 314 of FIG. 3 according to the presentinvention. Block 402 represents the acceptance of the SQL query. Block404 represents the translation of the query into an expression tree.Block 406 represents the RDBMS software performing the generalizedinference propagation, as described in more detail below in conjunctionwith FIGS. 5A-5H. Block 408 represents the RDBMS software performing thegeneralized transitive closure, as described in more detail below. Afterthese query transformation steps are performed, block 410 returnscontrol to block 204 in FIG. 2 or block 314 in FIG. 3 for subsequentprocessing steps, including the execution of the SQL query against therelational database and the output of the result table to the user.

GENERALIZED INFERENCE PROPAGATION

FIGS. 5A-5H together are a flowchart describing the generalizedinference propagation method that determines which of the "component"attribute pairs of the "p* intersection-predicate" become join-reducedand then uses simplification identities to simplify the query.

For example, consider the relations r₁, r₂, r₃, and r₄ having schemas L₁L₂, L₃ L₄, R₁, and R₂ R₃ R₄, respectively. Then, an application ofinference propagation according to the present invention would simplifya query expression like: ##EQU23## The latter expression can then bemade to execute even more efficiently by applying generalized transitiveclosure of selection operators (as described in more detailhereinafter).

Strong Attributes

An attribute A is said to be strong if all tuples defined on schemascontaining A have non-null values for attribute A. Typically, attributesare strong if they are referenced through null-intolerant selection orjoin predicates. For example, attributes A₁ and B₁ in the relationalexpression ##EQU24## are strong. Strongly Bound Attributes

A real attribute A is said to be strongly bound if all tuples defined onschemas containing A have exactly the same non-null value for attributeA. Typically, an attribute A in the real schema of expression σ getsstrongly bound when σ_(p) (e) assigns a constant value to attributeAεsch(p).

Notation

With regard to notation, strongSet(r) is used to denote the set ofstrong/strongly bound attributes in the (sub) expression represented by(sub)tree T. In addition, if ⊙_(r) is the root of the (sub)treecorresponding to (sub)expression e, then the shorthand notationsch(⊙_(r)) is used to represent sch(e). Finally, for any attribute A inthe schema of one of the operands of ∩_(d), ∩_(a), the notationCompat(A) is used to represent the corresponding union-compatibleattribute from the other operand.

Flowchart

Following is a detailed description of the flowchart of FIGS. 5A-5H,which performs query expression simplification via inference propagationin the expression tree for a given query expression.

Referring first to FIG. 5A, block 502 represents the acceptance of inputinto the routine, i.e., from block 406 in FIG. 4. The input comprises aroot of an expression (sub)tree T containing unary operators {δ,π,σ} andbinary operators {,→,←, ⃡, ∩_(d),∩_(a) }. In the expression sub(tree),it is assumed that all selections have been pushed down as close to thebase relations as possible. Further, it is assumed that the query hasundergone query simplification to convert unnecessary outer joins tojoins, etc., as described in BHAR94!, which is incorporated by referenceherein.

Let ⊙_(r) be the root of T. The inference propagation method does apost-order traversal of the tree as described in the following steps.

Block 504 is a decision block that determines whether ⊙_(r) is a leaf(i.e., a base relation) of the expression tree. If so, then control istransferred to block 506, which sets strongSet(T) to an empty set, andthen to block 508, which terminates the routine and returns to thecalling routine. Otherwise control is transferred to block 510.

Block 510 is a decision block that determines whether ⊙_(r) =σ_(p) andits child ⊙_(c) is a base relation s=(R,V,E). If so, then control istransferred to block 512, which stores attribute A in strongSet(T) if Ais in sch(p) and p strongly binds A, and then to block 508, whichterminates the routine and returns to the calling routine. Otherwise,control is transferred to block 514.

Block 514 is a decision block that determines whether ⊙_(r)ε{δ_(X),π_(X) }, wherein ⊙_(c) is its child. If so, then control istransferred to block 516, which stores strongSet(⊙_(c))∩X intostrongSet(T), and then to block 508, which terminates the routine andreturns to the calling routine. Otherwise, control is transferred toFIG. 5B via "A".

Referring now to FIG. 5B, block 518 is a decision block that determineswhether ##EQU25## is a binary operator ε{,→, ←→}, wherein ⊙_(L) and⊙_(R) are its left and right children, respectively. If not then controlis transferred to FIG. 5F via "B"; otherwise, control is transferred toblock 520.

Block 520 is a decision block that determines whether ⊙_(r) =. If not,then control is transferred FIG. 5C via "C". Otherwise, control istransferred to block 522.

Block 522 computes the strongSet(⊙_(L)) by recursively invoking thisinference propagation routine for the left child ⊙_(L). Block 524computes the strongSet(⊙_(R)) by recursively invoking this inferencepropagation routine for the right child ⊙_(R). Block 526 computes thestrongSet(T) by taking the union of the above two strongSets and allattributes in the join predicates, i.e., by storing strongSet(⊙_(L)) ∪strongset (⊙_(R)) ∪ (all attributes mentioned in p_(LR)) intostrongSet(T). Block 528 terminates the routine and returns control tothe calling routine.

Referring now to FIG. 5C, block 530 is a decision block that determineswhether ⊙_(r) =→. If not, then control is transferred to FIG. 5D via"D". Otherwise, control is transferred to block 532.

Block 532 computes the strongSet(⊙_(L)) by recursively invoking thisinference propgation routine. Block 534 stores strongSet(⊙_(L)) intostrongSet(T). Block 536 terminates the routine and returns control tothe calling routine.

Referring now to FIG. 5D, block 538 is a decision block that determineswhether ⊙_(r) =←. If not, then control is transferred to FIG. 5E via"E". Otherwise, control is transferred to block 540.

Block 540 computes the strongSet(⊙_(R)) by recursively invoking thisinference propgation routine. Block 542 stores strongSet(⊙_(R)) intostrongSet(T). Block 544 terminates the routine and returns control tothe calling routine.

Referring now to FIG. 5E, block 546 is a decision block that determineswhether ⊙_(r) =⃡. If not, then control is transferred to FIG. 5F via "B".Otherwise, control is transferred to block 548.

Block 548 sets strongSet(T) to an empty set. Block 550 terminates theroutine and returns control to the calling routine.

Referring now to FIG. 5F, block 552 is a decision block that determineswhether ⊙_(r) ε{∩_(d),∩_(a) }, wherein ⊙_(L) and ⊙_(R) are its left andright children, respectively. If not, then control is transferred toblock 554, which terminates the inference propagation and returns to thecalling routine. Otherwise, control is transferred to block 556.

Block 556 stores attribute A into simplifySet(T) if A ε strongSet(⊙_(L))and Compat(A) .epsilon slash. strongSet(⊙_(R)), or if A εstrongSet(⊙_(L)) and Compat(A) .epsilon slash. strongSet (⊙_(R)). Thus,simplifySet(T) is a set of those attributes that are strong or stronglybound in exactly one of the two operands of the intersect operator.

Block 558 is a sequential "for" loop that is executed once for everyattribute A ε simplifySet(T). Block 558 transfers control to block 560as the first step of the loop for each attribute. Otherwise, controltransfers to FIG. 5G via "G" after all attributes A ε simplifySet(T)have been processed in the loop.

Block 560 recursively applies the simplification identities (1)-(9) tothe expression (sub)tree with predicate p* join-reduced in attributes Aand Compat(A), as described in more detail below. The simplificationidentities are applied to the tree to transform it into a modified tree,the identities are then re-applied to the modified tree, and the processcontinues until none of the simplification identities can be applied tothe tree. Control is then returned to block 558 to complete the loop.

Referring now to FIG. 5G, block 562 is a decision block that determineswhether simplifySet(T)≠.o slashed.. If not, then control is transferredto FIG. 5H via "H". Otherwise, control is transferred to block 564.

Block 564 recomputes strongSet (⊙_(L)) by recursively invoking thisinference propagation routine with (⊙_(L)) as its input. Block 566recomputes strongSet(⊙_(R)) by recursively invoking this inferencepropagation routine with (⊙_(R)) as its input.

Referring now to FIG. 5H, block 568 is a decision block that determineswhether simplifySet(T)=.o slashed.. If not, then control is transferredto FIG. 5F via "F" to repeat the above steps. Otherwise, control istransferred to block 570.

Block 570 stores the renamed version of strongSet(⊙_(L)) orstrongSet(⊙_(R)) into strongSet(T), where the renaming is done tocorrespond to the attribute names in the result of ∩_(d) or ∩_(a).Finally, block 572 terminates the routine and returns control to thecalling routine.

Notes On Inference Propagation

The step of the inference propagation method that handles {∩_(d),∩_(a) }operators, computes, in some sense, a "fixed point" solution of thestrong set. The number of iterations in this fixed point solution arebounded by the number of arrows (counting a one sided outer joins as onearrow, and full outer join as two arrows) in the (sub) tree rooted atthe intersection operator. Since this is a finite number, it alsoprovides proof that the method steps will eventually terminate.

In one possible embodiment of the present invention, the execution ofthe inference propagation method could be made more efficient. Forexample, every time strongSet(⊙_(L)) or strongSet(⊙_(R)) is re-computed,the computation can be restricted to the paths that are affected by theprevious simplification, instead of performing a complete traversal ofthe subtree. This could, for example, be accomplished by keeping a listof paths from the intersection operator to the leaf that contains theouter join operations that are candidates for simplification.

Example of Inference Propagation

Following is an example of inference propagation. Consider the followingquery involving relations A, B, C, D, E, F, and G having attributes A₁A₂ A₃ A₄ A₅, B₁, C₁, D₁, E₁, F₁ F₂, and G₁, respectively, where theunion-compatible attribute pairings are A₁ C₁, A₂ D₁, A₃ E₁, A₄ F₁, A₅F₂, B₁ G₁, C₁ F₁, D₁ F₂, and E₁ G₁ : ##EQU26##

Then, an application of inference propagation would determine thefollowing, incrementally.

Since A₃ and B₁ are strong and A₃ E₁ is a union-compatible pair, ∩_(a)would cause: ##EQU27## to become: ##EQU28##

Since C₁ is now strong and C₁ F₁ is a union-compatible pair, ∩_(a) wouldcause: ##EQU29## to become: ##EQU30##

Since F₂ is now strong and D₁ F₂ is a union-compatible pair, ∩_(a) wouldcause: ##EQU31## to become: ##EQU32##

As a result, the query simplifies to: ##EQU33##

SIMPLIFICATION IDENTITIES

Following are the simplification identities applied at block 542 forquery expressions involving joins, outer joins, full outer joins, andintersection operators.

Let X_(i) =(R_(i), V_(i), E_(i)), where 1≦i≦4, be relationalexpressions, p be a null-intolerant predicate, and ∩ε{∩_(a),∩_(d) }.Then, the following simplification identities can be used, if theassociated criteria are met.

Simplification Identity (1) ##EQU34## where sch(p) ∩R₃ ≠.o slashed..Simplification Identity (2) ##EQU35## where sch(p) ∩R₂ ≠.o slashed..Simplification Identity (3) ##EQU36## where sch(p)∩R₂ ≠.o slashed. andsch(p)∩R₃ ≠.o slashed..

Simplification Identity (4) ##EQU37## where sch(p)∩R₃ ≠.o slashed..Simplification Identity (5) ##EQU38## where sch(p)∩R₃ ≠.o slashed..Simplification Identity (6) ##EQU39## where sch(p)∩R₂ ≠.o slashed. andsch(p)∩R₃ ≠.o slashed..

Corresponding Simplification Identities

Corresponding identities which simplify the right operand of theintersection operator are also used. All of these identities are appliedrecursively.

Additional Simplification Identities

The intersection operator has two union-compatible operands. Withoutloss of generality, let the schemas of the two operands be L₁ L₂ . . .L_(n) and R₁ R₂ . . . R_(n), where each of the L_(i) R_(i) pairs areunion compatible, 1≦i≦n. The definition of intersection distinct can bere-stated as:

    r.sub.1 ∩.sub.d r.sub.2 ={t.sub.1 εr.sub.1 |(∃t.sub.2 εr.sub.2)(∀i) (1≦i≦np*.sub.L.sbsb.i.sub.R.sbsb.i is TRUE)}

where P*_(L).sbsb.i_(R) _(i) is the predicate:

    (t.sub.1  L.sub.i ! is not NULL t.sub.2  R.sub.i ! is not NULL t.sub.1  L.sub.i !=t.sub.2  R.sub.i !)(t.sub.1  L.sub.i ! is NULL t.sub.2  R.sub.i ! is NULL)

Or, equivalently:

    r.sub.1 ∩.sub.d r.sub.2 ={t.sub.2 εr.sub.2 |(∃t.sub.1 εr.sub.1)(∀i)(1≦i≦np*.sub.L.sbsb.i.sub.R.sub.i is TRUE)}

Also, p*_(L).sbsb.i_(R).sbsb.i =p*_(L).sbsb.i_(R).sbsb.I for 1≦i≦n.

In this context, the operator "=" is used for non-NULL values, alongwith the predicates "is NULL" and "is not NULL".

For ∩_(a), the same predicate condition determines which tuples areincluded in the output, although duplicate tuples are permitted in theresult, subject to the "minimum rule".

In the definition of p*_(L).sbsb.i_(R).sbsb.i, if it can be determinedthat either L_(i) or R_(i), or both, are guaranteed to have non-nullvalues, then the predicate reduces to:

    t.sub.1  L.sub.i !=t.sub.2  R.sub.i !

Such a predicate is termed "join-reduced" in attributes L_(i) and R_(i).For such join-reduced predicates, the following novel simplificationidentities can be applied.

Let X_(i) =(R_(i), V_(i), E_(i)), where 1≦i≦3, be relationalexpressions, ∩ε{∩_(a),∩_(d) }, and let p* be a join-reduced predicate.Then, the following simplification identities can be used, if theassociated criteria are met.

Simplification Identity (7) ##EQU40## if p* is join-reduced in anattribute in R₃. Simplification Identity (8) ##EQU41## if p* isjoin-reduced in an attribute in R₃. Simplification Identity (9)##EQU42## if p* is join-reduced in an attribute in R₂ and R₃.Corresponding Simplification Identities

Corresponding identities which simplify the left operand of theintersection operator are also used. All these identities are appliedrecursively.

GENERALIZED TRANSITIVE CLOSURE

The following identities are applied at block 408 of FIG. 4 andconstitute a pushup and pushdown rules. The first one of these is aknown result for equi-join predicates; the present invention applies topredicates in general. A recursive application of these identities atblock 408 of FIG. 4 results in generalized transitive closure.

Let X=(R_(X), V_(X), E_(X)), Y=(R_(Y), V_(Y), E_(Y)), and Z=(R_(Z),V_(Z), E_(Z)) be relational expressions, P_(X) be a selection predicateon X, P_(Y) be a selection predicate on Y, and J_(XY) be anull-intolerant join/outer join predicate. Then, the followinggeneralized transitive closure identities can be used, if the associatedcriteria are met.

Generalized Transitive Closure Identity (1) ##EQU43## if p_(X) J_(XY)p_(Y). In the above if J_(XY) is an equi-join predicate, then theimplication can be replaced by equivalence.

Generalized Transitive Closure Identity (2) ##EQU44## if p_(X) J_(XY)p_(Y). In the above, if J_(XY) is an equi-join predicate, then theimplication can be replaced by equivalence.

Generalized Transitive Closure Identity (3) ##EQU45## if p_(X) J_(XY)p_(Y). In the above, if J_(XY) is an equi-join predicate, then theimplication can be replaced by equivalence.

Generalized Transitive Closure Identity (4) ##EQU46## (from GALI92b!).Further: ##EQU47## if p_(Y) J_(XY) p_(X). Generalized Transitive ClosureIdentity (5)

Let p be a predicate on the output attributes of an intersectionoperation (∩_(d),∩_(a)), and let p_(X) and p_(Y) be obtained from p byrenaming the attributes referenced in p to the corresponding attributesin the operands X and Y, respectively. Then:

    σ.sub.p (X∩Y)=σ.sub.p.sbsb.X (X)∩σ.sub.p.sbsb.Y (Y)

for ∩ε{∩_(d), ∩_(a) }, and:

    σ.sub.p.sbsb.X (X)∩Y=σ.sub.p (X∩Y)

for ∩ε{∩_(d), ∩_(a) }.

Generalized Transitive Closure Identity (6) ##EQU48## where p isobtained by replacing each of the attributes in J_(XY) by theircorresponding union-compatible attributes from Z, and ∩ε(∩_(d), ∩_(a)).If any of the attributes in J_(XY) are not part of the schema of theexpression: ##EQU49## (e.g., removed through projection), then there isno corresponding union-compatible attributes from Z, and thetransformation does not apply.

Additionally, using the identities for σ-pushdown described herein, theexpression σ_(p) (Z) could be simplified further. This transformationwould be applied after the inference propagation method has been used tosimplify the query.

CONCLUSION

This concludes the description of the preferred embodiment of theinvention. The following describes some alternative embodiments foraccomplishing the present invention. For example, any type of computer,such as a mainframe, minicomputer, or personal computer, could be usedwith the present invention. In addition, any software program adhering(either partially or entirely) to the SQL language could benefit fromthe present invention.

In summary, the present invention discloses a method and apparatus forquery simplification using generalized inference propagation andgeneralized transitive closure in SQL queries having selection,projection, join, outer join, and intersection operations.

The foregoing description of the preferred embodiment of the inventionhas been presented for the purposes of illustration and description. Itis not intended to be exhaustive or to limit the invention to theprecise form disclosed. Many modifications and variations are possiblein light of the above teaching. It is intended that the scope of theinvention be limited not by this detailed description, but rather by theclaims appended hereto.

What is claimed is:
 1. A method of simplifying an SQL query in acomputer having a memory, the SQL query being performed by the computerto retrieve data from a relational database stored in a electronicstorage device coupled to the computer, the method comprising the stepsof:(a) iteratively calculating, in the memory of the computer, a strongset of attributes referenced in the query; and (b) iterativelyreplacing, in the memory of the computer, full, left, and right outerjoin operations found in one or more operands of an intersectionoperation in the query with outer and inner join operations using thestrong set of attributes until no more simplification is possible. 2.The method as set forth in claim 1 above, wherein an attribute A is amember of the strong set of attributes when all tuples defined onschemas containing A have non-null values for attribute A.
 3. The methodas set forth in claim 1 above, wherein an attribute A is a member of thestrong set of attributes when the attribute A is referenced throughnull-intolerant selection and join predicates.
 4. The method as setforth in claim 1 above, wherein an attribute A is a member of the strongset of attributes when all tuples defined on schemas containing A haveexactly the same non-null value for attribute A.
 5. The method as setforth in claim 1 above, wherein an attribute A is a member of the strongset of attributes when a selection operator referencing the attribute Aassigns a constant value to the attribute A.
 6. The method as set forthin claim 1 above, wherein the iteratively replacing step comprises thestep of replacing a first expression in the query with a secondexpression, wherein the first expression comprises: ##EQU50## and thesecond expression comprises: ##EQU51## wherein X_(i) =(R_(i), V_(i),E_(i)) are relational expressions for 1≦i≦4, p is a null-intolerantpredicate, sch(p)∩R₃ ≠.o slashed., is a join operator, → is a left outerjoin operator, ∩_(a) is an intersect all operator, ∩_(d) is an intersectdistinct operator, and ∩ε{∩_(a),∩_(d) }.
 7. The method as set forth inclaim 1 above, wherein the iteratively replacing step comprises the stepof replacing a first expression in the query with a second expression,wherein the first expression comprises: ##EQU52## and the secondexpression comprises: ##EQU53## wherein X_(i) =(R_(i), V_(i), E_(i)) arerelational expressions for 1≦i≦4, p is a null-intolerant predicate,sch(p)∩R₂ ≠.o slashed., → is a left outer join operator, ⃡ is a fullouter join operator, ∩_(a) is an intersect all operator, ∩_(d) is anintersect distinct operator, and ∩ε{∩_(a),∩_(d) }.
 8. The method as setforth in claim 1 above, wherein the iteratively replacing step comprisesthe step of replacing a first expression in the query with a secondexpression, wherein the first expression comprises: ##EQU54## and thesecond expression comprises: ##EQU55## wherein X_(i) =(R_(i), V_(i),E_(i)) are relational expressions for 1≦i≦4, p is a null-intolerantpredicate, sch(p)∩R₂ ≠.o slashed., sch(p)∩R₃ ≠.o slashed., is a joinoperator, ⃡ is a full outer join operator, ∩_(a) is an intersect alloperator, ∩_(d) is an intersect distinct operator, and ∩ε{∩_(a),∩_(d) }.9. The method as set forth in claim 1 above, wherein the iterativelyreplacing step comprises the step of replacing a first expression in thequery with a second expression, wherein the first expression comprises:##EQU56## and the second expression comprises: ##EQU57## wherein X_(i)=(R_(i), V_(i), E_(i)) are relational expressions for 1≦i≦4, p is anull-intolerant predicate, sch(p)∩R₃ ≠.o slashed., is a join operator, →is a left outer join operator, ∩_(a) is an intersect all operator, ∩_(d)is an intersect distinct operator, and ∩ε{∩_(a),∩_(d) }.
 10. The methodas set forth in claim 1 above, wherein the iteratively replacing stepcomprises the step of replacing a first expression in the query with asecond expression, wherein the first expression comprises: ##EQU58## andthe second expression comprises: ##EQU59## wherein X_(i) =(R_(i), V_(i),E_(i)) are relational expressions for 1≦i≦4, p is a null-intolerantpredicate, sch(p)∩R₃ ≠.o slashed., → is a left outer join operator, ⃡ isa full outer join operator, ∩_(a) is an intersect all operator, ∩_(d) isan intersect distinct operator, and ∩ε{∩_(a),∩_(d) }.
 11. The method asset forth in claim 1 above, wherein the iteratively replacing stepcomprises the step of replacing a first expression in the query with asecond expression, wherein the first expression comprises: ##EQU60## andthe second expression comprises: ##EQU61## wherein X_(i) =(R_(i), V_(i),E_(i)) are relational expressions for 1≦i≦4, p is a null-intolerantpredicate, sch(p)∩R₂ ≠.o slashed., sch(p)∩R₃ ≠.o slashed., is a joinoperator, → is a left outer join operator, ⃡ is a full outer joinoperator, ∩_(a) is an intersect all operator, ∩_(d) is an intersectdistinct operator, and ∩ε{∩_(a),∩_(d) }.
 12. The method as set forth inclaim 1 above, wherein the iteratively replacing step comprises the stepof replacing a first expression in the query with a second expression,wherein the first expression comprises: ##EQU62## and the secondexpression comprises: ##EQU63## wherein X_(i) =(R_(i), V_(i), E_(i)) arerelational expressions for 1≦i≦4, p* is a join reduced predicate in anattribute in R₃, is a join operator, → is a left outer join operator,∩_(a) is an intersect all operator, ∩_(d) is an intersect distinctoperator, and ∩ε{∩_(a),∩_(d) }.
 13. The method as set forth in claim 1above, wherein the iteratively replacing step comprises the step ofreplacing a first expression in the query with a second expression,wherein the first expression comprises: ##EQU64## and the secondexpression comprises: ##EQU65## wherein X_(i) =(R_(i), V_(i), E_(i)) arerelational expressions for 1≦i≦4, p* is a join reduced predicate in anattribute in R₂, → is a left outer join operator, ⃡ is a full outer joinoperator, ∩_(a) is an intersect all operator, ∩_(d) is an intersectdistinct operator, and ∩ε{∩_(a),∩_(d) }.
 14. The method as set forth inclaim 1 above, wherein the iteratively replacing step comprises the stepof replacing a first expression in the query with a second expression,wherein the first expression comprises: ##EQU66## and the secondexpression comprises: ##EQU67## wherein X_(i) =(R_(i), V_(i), E_(i)) arerelational expressions for 1≦i≦4, p* is a join reduced predicate in anattribute in R₂ and R₃, is a join operator, ⃡ is a full outer joinoperator, ∩_(a) is an intersect all operator, ∩_(d) is an intersectdistinct operator, and ∩ε{∩_(a),∩_(d) }.
 15. The method as set forth inclaim 1 above, further comprising the steps of:(1) iterativelycalculating a strong set of attributes referenced in a sub-expression ofthe query; and (2) iteratively replacing full, left, and right outerjoin operations in the sub-expression with outer and inner joinoperations using the strong set of attributes for the sub-expressionuntil no more simplification is possible.
 16. A method of simplifying anSQL query in a computer having a memory, the SQL query being performedby the computer to retrieve data from a relational database stored in aelectronic storage device coupled to the computer, the method comprisingthe steps of:(a) generating an expression tree to represent the query inthe memory of the computer, the expression tree comprised of a set ofleaves corresponding to relations referenced in the query, and a set ofinner nodes corresponding to operators occurring in the query; (b)generating a strong set of attributes for the expression tree in thememory of the computer; (c) post-order traversing the expression tree inthe memory of the computer until a node representing an intersectionoperator is encountered; (d) simplifying a subtree of the expressiontree below the intersection operator in the memory of the computer; (e)re-generating the strong set of attributes for the subtree in the memoryof the computer; and (f) repeating steps (b)-(d) until the strong set ofattributes does not change.
 17. The method as set forth in claim 16above, wherein the strong set of attributes is comprised of one or moreattributes A, such that all tuples defined on schemas containing A havenon-null values for attribute A.
 18. The method as set forth in claim 16above, wherein the strong set of attributes is comprised of one or moreattributes A, wherein the attribute A is referenced throughnull-intolerant selection and join predicates.
 19. The method as setforth in claim 16 above, wherein the strong set of attributes iscomprised of one or more attributes A, such that all tuples defined onschemas containing A have exactly the same non-null value for attributeA.
 20. The method as set forth in claim 16 above, wherein the strong setof attributes is comprised of one or more attributes A, wherein aselection operator referencing the attribute A assigns a constant valueto the attribute A.
 21. The method as set forth in claim 16 above,wherein the simplifying step comprises replacing an outer join operationwith a join operation.
 22. The method as set forth in claim 16 above,wherein the simplifying step comprises replacing a full outer joinoperation with a one-sided outer join operation.
 23. A method ofsimplifying an SQL query in a computer having a memory, the SQL querybeing performed by the computer to retrieve data from a relationaldatabase stored in a electronic storage device coupled to the computer,the method comprising the steps of:(a) examining the query in the memoryof the computer to identify instances where a selection operator isapplied to a result of a sub-expression, wherein the sub-expressioncomprises a first operator applied to first and second operands; (b)identifying, in the memory of the computer, classes of attributes forthe selection operator and the sub-expression that are equated throughthe first operator in the sub-expression; (c) generating, in the memoryof the computer, selection predicates from the identified classes; and(d) applying, in the memory of the computer, selection operators to thefirst and second operands using the generated selection predicatesbefore the first operator is performed, thereby making intermediateresults from the first operator smaller.
 24. The method of claim 23above, wherein the first operator is selected from a group comprising:(1) a join operator, (2) a full outer join operator, (3) a one-sidedouter join operator, and (4) an intersect operator.
 25. The method ofclaim 23 above, wherein the applying step further comprises the step ofreplacing: ##EQU68## when p_(X) J_(XY) P_(Y), wherein X=(R_(X), V_(X),E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) are relational expressions, p_(X) isa selection predicate on X, p_(Y) is a selection predicate on Y, J_(XY)is a null-intolerant join/outer join predicate, is a join operator, andσ is a selection operator.
 26. The method of claim 23 above, wherein theapplying step further comprises the step of replacing: ##EQU69## whenp_(x) J_(XY) p_(Y), wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y),E_(Y)) are relational expressions, p_(X) is a selection predicate on X,p_(Y) is a selection predicate on Y, J_(XY) is a null-intolerantjoin/outer join predicate, → is a left outer join operator, and σ is aselection operator.
 27. The method of claim 23 above, wherein theapplying step further comprises the step of replacing: ##EQU70## whenp_(X) J_(XY) p_(Y), wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y),E_(Y)) are relational expressions, p_(X) is a selection predicate on X,p_(Y) is a selection predicate on Y, J_(XY) is a null-intolerantjoin/outer join predicate, → is a left outer join operator, and σ is aselection operator.
 28. The method of claim 23 above, wherein theapplying step further comprises the step of replacing: ##EQU71## whenp_(Y) J_(XY) p_(X), wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y),E_(Y)) are relational expressions, p_(X) is a selection predicate on X,p_(Y) is a selection predicate on Y, J_(XY) is a null-intolerantjoin/outer join predicate, is a join operator, and σ is a selectionoperator.
 29. The method of claim 23 above, wherein the applying stepfurther comprises the step of replacing:

    σ.sub.p (X∩Y)

with:

    σ.sub.p.sbsb.X (X)∩σ.sub.p.sbsb.Y (Y)

wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) arerelational expressions, p is a predicate on output attributes of anintersection operation ∩ε(∩_(d),∩_(a)), ∩_(a) is an intersect alloperator, ∩_(d) is an intersect distinct operator, p_(X) is a selectionpredicate on operand X, p_(y) is a selection predicate on operand Y,p_(X) and p_(Y) are obtained from p by renaming attributes referenced inp to corresponding attributes in the operands X and Y, respectively, isa join operator, and σ is a selection operator.
 30. The method of claim23 above, wherein the applying step further comprises the step ofreplacing:

    σ.sub.p.sbsb.X (X)∩Y

with:

    σ.sub.p (X∩Y)

wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) arerelational expressions, p is a predicate on output attributes of anintersection operation ∩ε(∩_(d) d,∩_(a)), ∩_(a) is an intersect alloperator, ∩_(d) is an intersect distinct operator, p_(X) is a selectionpredicate on operand X, p_(Y) is a selection predicate on operand Y,p_(X) and p_(Y) are obtained from p by renaming attributes referenced inp to corresponding attributes in the operands X and Y, respectively, andσ is a selection operator.
 31. The method of claim 23 above, wherein theapplying step further comprises the step of replacing: ##EQU72## whereinX=(R_(X), V_(X), E_(X)), Y=(R_(Y), V_(Y), E_(Y)), and Z=(R_(Z), V_(Z),E_(Z)) are relational expressions, p is a predicate on output attributesof an intersection operation ∩ε(∩_(d),∩_(a)), ∩_(a) is an intersect alloperator, ∩_(d) is an intersect distinct operator, p is obtained byreplacing each attribute in J_(XY) by their correspondingunion-compatible attributes from Z, is a join operator, and σ_(p) is aselection operator.
 32. An apparatus for simplifying an SQL query,comprising:(a) a computer having a memory and an electronic storagedevice coupled thereto, the data storage device storing a relationaldatabase; (b) means, performed by the computer, for accepting the SQLquery into the memory of the computer, the SQL query being executed bythe computer to retrieve data from a relational database stored in thecomputer; (c) means, performed by the computer, for iterativelycalculating a strong set of attributes referenced in the query; and (d)means, performed by the computer, for iteratively replacing full, left,and right outer join operations found in one or more operands of anintersection operation in the query with outer and inner join operationsusing the strong set of attributes until no more simplification ispossible.
 33. The apparatus as set forth in claim 32 above, wherein anattribute A is a member of the strong set of attributes when all tuplesdefined on schemas containing A have non-null values for attribute A.34. The apparatus as set forth in claim 32 above, wherein an attribute Ais a member of the strong set of attributes when the attribute A isreferenced through null-intolerant selection and join predicates. 35.The apparatus as set forth in claim 32 above, wherein an attribute A isa member of the strong set of attributes when all tuples defined onschemas containing A have exactly the same non-null value for attributeA.
 36. The apparatus as set forth in claim 32 above, wherein anattribute A is a member of the strong set of attributes when a selectionoperator referencing the attribute A assigns a constant value to theattribute A.
 37. The apparatus as set forth in claim 32 above, whereinthe means for iteratively replacing comprises means for replacing afirst expression in the query with a second expression, wherein thefirst expression comprises: ##EQU73## and the second expressioncomprises: ##EQU74## wherein X_(i) =(R_(i), V_(i), E_(i)) are relationalexpressions for 1≦i≦4, p is a null-intolerant predicate sch(p)∩R₃ ≠.oslashed., is a join operator, → is a left outer join operator, ∩_(a) isan intersect all operator, ∩_(d) is an intersect distinct operator, and∩ε{∩_(a),∩_(d) }.
 38. The apparatus as set forth in claim 32 above,wherein the means for iteratively replacing comprises means forreplacing a first expression in the query with a second expression,wherein the first expression comprises: ##EQU75## and the secondexpression comprises: ##EQU76## wherein X_(i) =R_(i), V_(i), E_(i)) arerelational expressions for 1≦i≦4, p is a null-intolerant predicate,sch(p)∩R₂ ≠.o slashed., → is a left outer join operator, ⃡ is a fullouter join operator, ∩_(a) is an intersect all operator, ∩_(d) is anintersect distinct operator, and ∩ε{∩_(a),∩_(d) }.
 39. The apparatus asset forth in claim 32 above, wherein the means for iteratively replacingcomprises means for replacing a first expression in the query with asecond expression, wherein the first expression comprises: ##EQU77## andthe second expression comprises: ##EQU78## wherein X_(i) =(R_(i), V_(i),E_(i)) are relational expressions for 1≦i≦4, p is a null-intolerantpredicate, sch(p)∩R₂ ≠.o slashed., sch(p)∩R₃ ≠.o slashed., is a joinoperator, ⃡ is a full outer join operator, ∩_(a) is an intersect alloperator, ∩_(d) is an intersect distinct operator, and ∩ε{∩_(a),∩_(d) }.40. The apparatus as set forth in claim 32 above, wherein the means foriteratively replacing comprises means for replacing a first expressionin the query with a second expression, wherein the first expressioncomprises: ##EQU79## and the second expression comprises: ##EQU80##wherein X_(i) =(R_(i), V_(i), E_(i)) are relational expressions for1≦i≦4, p is a null-intolerant predicate, sch(p)∩R₃ ≠.o slashed., is ajoin operator, → is a left outer join operator, ∩_(a) is an intersectall operator, ∩_(d) is an intersect distinct operator, and∩ε{∩_(a),∩_(d) }.
 41. The apparatus as set forth in claim 32 above,wherein the means for iteratively replacing comprises means forreplacing a first expression in the query with a second expression,wherein the first expression comprises: ##EQU81## and the secondexpression comprises: ##EQU82## wherein X_(i) =(R_(i), V_(i), E_(i)) arerelational expressions for 1≦i≦4, p is a null-intolerant predicate,sch(p)∩R₃ ≠.o slashed., → is a left outer join operator, ⃡ is a fullouter join operator, ∩_(a) is an intersect all operator, ∩_(d) is anintersect distinct operator, and ∩ε{∩_(a),∩_(d) }.
 42. The apparatus asset forth in claim 32 above, wherein the means for iteratively replacingcomprises means for replacing a first expression in the query with asecond expression, wherein the first expression comprises: ##EQU83## andthe second expression comprises: ##EQU84## wherein X_(i) =(R_(i), V_(i),E_(i)) are relational expressions for 1≦i≦4, p is a null-intolerantpredicate, sch(p)∩R₂ ≠.o slashed., sch(p)∩R₃ ≠.o slashed., is a joinoperator, → is a left outer join operator, ⃡ is a full outer joinoperator, ∩_(a) is an intersect all operator, ∩_(d) is an intersectdistinct operator, and ∩ε{∩_(a), ∩_(d) }.
 43. The apparatus as set forthin claim 32 above, wherein the means for iteratively replacing comprisesmeans for replacing a first expression in the query with a secondexpression, wherein the first expression comprises: ##EQU85## and thesecond expression comprises: ##EQU86## wherein X_(i) =(R_(i), V_(i),E_(i)) are relational expressions for 1≦i≦4, p* is a join reducedpredicate in an attribute in R₃, is a join operator, → is a left outerjoin operator, ∩_(a) is an intersect all operator, ∩_(d) is an intersectdistinct operator, and ∩ε{∩_(a),∩_(d) }.
 44. The apparatus as set forthin claim 32 above, wherein the means for iteratively replacing comprisesmeans for replacing a first expression in the query with a secondexpression, wherein the first expression comprises: ##EQU87## and thesecond expression comprises: ##EQU88## wherein X_(i) =(R_(i), V_(i),E_(i)) are relational expressions for 1≦i≦4, p* is a join reducedpredicate in an attribute in R₂, → is a left outer join operator, ⃡ is afull outer join operator, ∩_(a) is an intersect all operator, ∩_(d) isan intersect distinct operator, and ∩ε{∩_(a),∩_(d) }.
 45. The apparatusas set forth in claim 32 above, wherein the means for iterativelyreplacing comprises means for replacing a first expression in the querywith a second expression, wherein the first expression comprises:##EQU89## and the second expression comprises: ##EQU90## wherein X_(i)=(R_(i), V_(i), E_(i)) are relational expressions for 1≦i≦4, p* is ajoin reduced predicate in an attribute in R₂ and R₃, is a join operator,⃡ is a full outer join operator, ∩_(a) is an intersect all operator,∩_(d) is an intersect distinct operator, and ∩ε{∩_(a),∩_(d) }.
 46. Anapparatus for simplifying an SQL query, comprising:(a) a computer havinga memory and an electronic storage device coupled thereto, the datastorage device storing a relational database; (b) means, performed bythe computer, for accepting the SQL query into the memory of thecomputer, the SQL query being performed by the computer to retrieve datafrom a relational database stored in the computer; (c) means, performedby the computer, for examining the query to identify instances where aselection operator is applied to a result of a sub-expression, whereinthe sub-expression comprises a first operator applied to first andsecond operand tables; (d) means, performed by the computer, foridentifying classes of attributes for the selection operator and thesub-expression that are equated through the first operator in thesub-expression; (e) means, performed by the computer, for generatingselection predicates from the identified classes; and (f) means,performed by the computer, for applying selection operators to the firstand second operand tables using the generated selection predicatesbefore the first operator is performed, thereby making intermediateresults from the first operator smaller.
 47. The apparatus as set forthin claim 32 above, further comprising:(1) means for iterativelycalculating a strong set of attributes referenced in a sub-expression ofthe query; and (2) means for iteratively replacing full, left, and rightouter join operations in the sub-expression with outer and inner joinoperations using the strong set of attributes for the sub-expressionuntil no more simplification is possible.
 48. The apparatus of claim 46above, wherein the first operator is selected from a group comprising:(1) a join operator, (2) a full outer join operator, (3) a one-sidedouter join operator, and (4) an intersect operator.
 49. The apparatus ofclaim 46 above, wherein the means for applying further comprises meansfor replacing: ##EQU91## when p_(X) J_(XY) p_(Y), wherein X=(R_(X),V_(X), E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) are relational expressions,p_(X) is a selection predicate on X, p_(Y) is a selection predicate onY, J_(XY) is a null-intolerant join/outer join predicate, is a joinoperator, and σ is a selection operator.
 50. The apparatus of claim 46above, wherein the means for applying further comprises means forreplacing: ##EQU92## when p_(X) J_(XY) p_(Y), wherein X=(R_(X), X_(X),E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) are relational expressions, p_(X) isa selection predicate on X, p_(Y) is a selection predicate on Y, J_(XY)is a null-intolerant join/outer join predicate, → is a left outer joinoperator, and σ is a selection operator.
 51. The apparatus of claim 46above, wherein the means for applying further comprises means forreplacing: ##EQU93## when p_(X) J_(XY) p_(Y), wherein X=(R_(X), V_(X),E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) are relational expressions, p_(X) isa selection predicate on X, p_(Y) is a selection predicate on Y, J_(XY)is a null-intolerant join/outer join predicate, → is a left outer joinoperator, and σ is a selection operator.
 52. The apparatus of claim 46above, wherein the means for applying further comprises means forreplacing: ##EQU94## when p_(Y) J_(XY) p_(X), wherein X=(R_(X), V_(X),E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) are relational expressions, p_(X) isa selection predicate on X, p_(Y) is a selection predicate on Y, J_(XY)is a null-intolerant join/outer join predicate, is a join operator, andσ is a selection operator.
 53. The apparatus of claim 46 above, whereinthe means for applying further comprises means for replacing:

    σ.sub.p (X∩Y)

with:

    σ.sub.p.sbsb.X (X)∩σ.sub.p.sbsb.Y (Y)

wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) arerelational expressions, p is a predicate on output attributes of anintersection operation ∩ε(∩_(d),∩_(a)), ∩_(a) is an intersect alloperator, ∩_(d) is an intersect distinct operator, p_(X) is a selectionpredicate on operand X, p_(Y) is a selection predicate on operand Y,p_(X) and p_(Y) are obtained from p by renaming attributes referenced inp to corresponding attributes in the operands X and Y respectively, is ajoin operator, and σ is a selection operator.
 54. The apparatus of claim46 above, wherein the means for applying further comprises means forreplacing:

    σ.sub.p.sbsb.X (X)∩Y

with:

    σ.sub.p (X∩Y)

wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) arerelational expressions, p is a predicate on output attributes of anintersection operation ∩ε(∩_(d),∩_(a))∩_(a) is an intersect alloperator, ∩_(d) is an intersect distinct operator, p_(X) is a selectionpredicate on operand X, p_(Y) is a selection predicate on operand Y,p_(X) and p_(Y) are obtained from p by renaming attributes referenced inp to corresponding attributes in the operands X and Y, respectively, andσ is a selection operator.
 55. The apparatus of claim 46 above, whereinthe means for applying further comprises means for replacing: ##EQU95##wherein X=(R_(X), X_(X), E_(X)), Y=(R_(Y), V_(Y), E_(Y)), and Z=(R_(Z),V_(Z), E_(Z)) are relational expressions, p is a predicate on outputattributes of an intersection operation ∩ε(∩_(d),∩_(a)), ∩_(a) is anintersect all operator, ∩_(d) is an intersect distinct operator, p isobtained by replacing each attribute in J_(XY) by their correspondingunion-compatible attributes from Z, is a join operator, and σ_(p) is aselection operator.
 56. A program storage device readable by a computer,tangibly embodying a program of instructions executable by the computerto perform method steps for simplifying an SQL query in a computerhaving a memory, the SQL query being performed by the computer toretrieve data from a relational database stored in an electronic storagedevice coupled to the computer, the method comprising the steps of:(a)iteratively calculating, in the memory of the computer, a strong set ofattributes referenced in the query; and (b) iteratively replacing, inthe memory of the computer, full, left, and right outer join operationsfound in one or more operands of an intersection operation in the querywith outer and inner join operations using the strong set of attributesuntil no more simplification is possible.
 57. The method as set forth inclaim 56 above, wherein an attribute A is a member of the strong set ofattributes when all tuples defined on schemas containing A have non-nullvalues for attribute A.
 58. The method as set forth in claim 56 above,wherein an attribute A is a member of the strong set of attributes whenthe attribute A is referenced through null-intolerant selection and joinpredicates.
 59. The method as set forth in claim 56 above, wherein anattribute A is a member of the strong set of attributes when all tuplesdefined on schemas containing A have exactly the same non-null value forattribute A.
 60. The method as set forth in claim 56 above, wherein anattribute A is a member of the strong set of attributes when a selectionoperator referencing the attribute A assigns a constant value to theattribute A.
 61. The method as set forth in claim 56 above, wherein theiteratively replacing step comprises the step of replacing a firstexpression in the query with a second expression, wherein the firstexpression comprises: ##EQU96## and the second expression comprises:##EQU97## wherein X_(i) =(R_(i), V_(i), E_(i)) are relationalexpressions for 1≦i≦4, p is a null-intolerant predicate, sch(p)∩R₃ ≠.oslashed., is a join operator, → is a left outer join operator, ∩_(a) isan intersect all operator, ∩_(d) is an intersect distinct operator, and∩ε{∩_(a),∩_(d) }.
 62. The method as set forth in claim 56 above, whereinthe iteratively replacing step comprises the step of replacing a firstexpression in the query with a second expression, wherein the firstexpression comprises: ##EQU98## and the second expression comprises:##EQU99## wherein X_(i) =(R_(i), V_(i), E_(i)) are relationalexpressions for 1≦i≦4, p is a null-intolerant predicate, sch(p)∩R₂ ≠.oslashed., → is a left outer join operator, ⃡ is a full outer joinoperator, ∩_(a) is an intersect all operator, ∩_(d) is an intersectdistinct operator, and ∩ε{∩_(a),∩_(d) }.
 63. The method as set forth inclaim 56 above, wherein the iteratively replacing step comprises thestep of replacing a first expression in the query with a secondexpression, wherein the first expression comprises: ##EQU100## and thesecond expression comprises: ##EQU101## wherein X_(i) =(R_(i), V_(i),E_(i)) are relational expressions for 1≦i≦4, p is a null-intolerantpredicate, sch(p)∩R₂ ≠.o slashed., sch(p)∩R₃ ≠.o slashed., is a joinoperator, ⃡ is a full outer join operator, ∩_(a) is an intersect alloperator, ∩_(d) is an intersect distinct operator, and ∩ε{∩_(a),∩_(d) }.64. The method as set forth in claim 56 above, wherein the iterativelyreplacing step comprises the step of replacing a first expression in thequery with a second expression, wherein the first expression comprises:##EQU102## and the second expression comprises: ##EQU103## wherein X_(i)=(R_(i), V_(i), E_(i)) are relational expressions for 1≦i≦4, p is anull-intolerant predicate, sch(p)∩R₃ ≠.o slashed., is a join operator, →is a left outer join operator, ∩_(a) is an intersect all operator, ∩_(d)is an intersect distinct operator, and ∩ε{∩_(a),∩_(d) }.
 65. The methodas set forth in claim 56 above, wherein the iteratively replacing stepcomprises the step of replacing a first expression in the query with asecond expression, wherein the first expression comprises: ##EQU104##and the second expression comprises: ##EQU105## wherein X_(i) =(R_(i),V_(i), E_(i)) are relational expressions for 1≦i≦4, p is anull-intolerant predicate, sch(p)∩R₃ ≠.o slashed., → is a left outerjoin operator, ⃡ is a full outer join operator, ∩_(a) is an intersectall operator, ∩_(d) is an intersect distinct operator, and∩ε{∩_(a),∩_(d) }.
 66. The method as set forth in claim 56 above, whereinthe iteratively replacing step comprises the step of replacing a firstexpression in the query with a second expression, wherein the firstexpression comprises: ##EQU106## and the second expression comprises:##EQU107## wherein X_(i) =(R_(i), V_(i), E_(i)) are relationalexpressions for 1≦i≦4, p is a null-intolerant predicate, sch(p)∩R₂ ≠.oslashed., sch(p)∩R₃ ≠.o slashed., is a join operator, → is a left outerjoin operator, ⃡ is a full outer join operator, ∩_(a) is an intersectall operator, ∩_(d) is an intersect distinct operator, and∩ε{∩_(a),∩_(d) }.
 67. The method as set forth in claim 56 above, whereinthe iteratively replacing step comprises the step of replacing a firstexpression in the query with a second expression, wherein the firstexpression comprises: ##EQU108## and the second expression comprises:##EQU109## wherein X_(i) =(R_(i), V_(i), E_(i)) are relationalexpressions for 1≦i≦4, p* is a join reduced predicate in an attribute inR₃, is a join operator, → is a left outer join operator, ∩_(a) is anintersect all operator, ∩_(d) is an intersect distinct operator, and∩ε{∩_(a),∩_(d) }.
 68. The method as set forth in claim 56 above, whereinthe iteratively replacing step comprises the step of replacing a firstexpression in the query with a second expression, wherein the firstexpression comprises: ##EQU110## and the second expression comprises:##EQU111## wherein X_(i) =(R_(i), V_(i), E_(i)) are relationalexpressions for 1≦i≦4, p* is a join reduced predicate in an attribute inR₂, → is a left outer join operator, ⃡ is a full outer join operator,∩_(a) is an intersect all operator, ∩_(d) is an intersect distinctoperator, and ∩ε{∩_(a),∩_(d) }.
 69. The method as set forth in claim 56above, wherein the iteratively replacing step comprises the step ofreplacing a first expression in the query with a second expression,wherein the first expression comprises: ##EQU112## and the secondexpression comprises: ##EQU113## wherein X_(i) =(R_(i), V_(i), E_(i))are relational expressions for 1≦i≦4, p* is a join reduced predicate inan attribute in R₂ and R₃, is a join operator, ⃡ is a full outer joinoperator, ∩_(a) is an intersect all operator, ∩_(d) is an intersectdistinct operator, and ∩ε{∩_(a),∩_(d) }.
 70. The method as set forth inclaim 56 above, further comprising the steps of:(1) iterativelycalculating a strong set of attributes referenced in a sub-expression ofthe query; and (2) iteratively replacing full, left, and right outerjoin operations in the sub-expression with outer and inner joinoperations using the strong set of attributes for the sub-expressionuntil no more simplification is possible.
 71. A program storage devicereadable by a computer, tangibly embodying a program of instructionsexecutable by the computer to perform method steps for simplifying anSQL query in a computer having a memory, the SQL query being performedby the computer to retrieve data from a relational database stored in anelectronic storage device coupled to the computer, the method comprisingthe steps of:(a) generating an expression tree to represent the query inthe memory of the computer, the expression tree comprised of a set ofleaves corresponding to relations referenced in the query, and a set ofinner nodes corresponding to operators occurring in the query; (b)generating a strong set of attributes for the expression tree in thememory of the computer; (c) traversing the expression tree in the memoryof the computer until a node representing an intersection operator isencountered; (d) simplifying a subtree of the expression tree below theintersection operator in the memory of the computer; (e) re-generatingthe strong set of attributes for the subtree in the memory of thecomputer; and (f) repeating steps (b)-(e) until the strong set ofattributes does not change.
 72. The method as set forth in claim 71above, wherein the strong set of attributes is comprised of one or moreattributes A, such that all tuples defined on schemas containing A havenon-null values for attribute A.
 73. The method as set forth in claim 71above, wherein the strong set of attributes is comprised of one or moreattributes A, wherein the attribute A is referenced throughnull-intolerant selection and join predicates.
 74. The method as setforth in claim 71 above, wherein the strong set of attributes iscomprised of one or more attributes A, such that all tuples defined onschemas containing A have exactly the same non-null value for attributeA.
 75. The method as set forth in claim 71 above, wherein the strong setof attributes is comprised of one or more attributes A, wherein aselection operator referencing the attribute A assigns a constant valueto the attribute A.
 76. The method as set forth in claim 71 above,wherein the simplifying step comprises replacing an outer join operationwith a join operation.
 77. The method as set forth in claim 71 above,wherein the simplifying step comprises replacing a full outer joinoperation with a one-sided outer join operation.
 78. A program storagedevice readable by a computer, tangibly embodying a program ofinstructions executable by the computer to perform method steps forsimplifying an SQL query in a computer having a memory, the SQL querybeing performed by the computer to retrieve data from a relationaldatabase stored in an electronic storage device coupled to the computer,the method comprising the steps of:(a) examining query in the memory ofthe computer to identify instances where a selection operator is appliedto a result of a sub-expression, wherein the sub-expression comprises afirst operator applied to first and second operand tables; (b)identifying, in the memory of the computer, classes of attributes forthe selection operator and the sub-expression that are equated throughthe first operator in the sub-expression; (c) generating, in the memoryof the computer, selection predicates from the identified classes; and(d) applying, in the memory of the computer, selection operators to thefirst and second operand tables using the generated selection predicatesbefore the first operator is performed, thereby making intermediateresults from the first operator smaller.
 79. The method of claim 78above, wherein the first operator is selected from a group comprising:(1) a join operator, (2) a full outer join operator, (3) a one-sidedouter join operator, and (4) an intersect operator.
 80. The method ofclaim 78 above, wherein the applying step further comprises the step ofreplacing: ##EQU114## when p_(X) J_(XY) p_(Y), wherein X=(R_(X), V_(X),E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) are relational expressions, p_(X) isa selection predicate on X, p_(Y) is a selection predicate on Y, J_(XY)is a null-intolerant join/outer join predicate, is a join operator, andσ is a selection operator.
 81. The method of claim 78 above, wherein theapplying step further comprises the step of replacing: ##EQU115## whenp_(X) J_(XY) p_(Y), wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y),E_(Y)) are relational expressions, p_(X) is a selection predicate on X,p_(Y) is a selection predicate on Y, J_(XY) is a null-intolerantjoin/outer join predicate, → is a left outer join operator, and σ is aselection operator.
 82. The method of claim 78 above, wherein theapplying step further comprises the step of replacing: ##EQU116## whenp_(X) J_(XY) p_(Y), wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y),E_(Y)) are relational expressions, p_(X) is a selection predicate on X,p_(Y) is a selection predicate on Y, J_(XY) is a null-intolerantjoin/outer join predicate, → is a left outer join operator, and σ is aselection operator.
 83. The method of claim 78 above, wherein theapplying step further comprises the step of replacing: ##EQU117## whenp_(Y) J_(XY) p_(X), wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y),E_(Y)) are relational expressions, p_(X) is a selection predicate on X,p_(Y) is a selection predicate on Y, J_(XY) is a null-intolerantjoin/outer join predicate, is a join operator, and σ is a selectionoperator.
 84. The method of claim 78 above, wherein the applying stepfurther comprises the step of replacing:

    σ.sub.p (X∩Y)

with:

    σ.sub.p.sbsb.X (X)∩σ.sub.p.sbsb.Y (Y)

wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) arerelational expressions, p is a predicate on output attributes of anintersection operation ∩ε(∩_(d), ∩_(a)), ∩_(a) is an intersect alloperator, ∩_(d) is an intersect distinct operator, p_(X) is a selectionpredicate on operand X, p_(Y) is a selection predicate on operand Y,p_(X) and p_(Y) are obtained from p by renaming attributes referenced inp to corresponding attributes in the operands X and Y, respectively, isa join operator, and σ is a selection operator.
 85. The method of claim78 above, wherein the applying step further comprises the step ofreplacing:

    σ.sub.p.sbsb.X (X)∩Y

with:

    σ.sub.p (∩Y)

wherein X=(R_(X), V_(X), E_(X)) and Y=(R_(Y), V_(Y), E_(Y)) arerelational expressions, p is a predicate on output attributes of anintersection operation ∩ε(∩_(d),∩_(a)), ∩_(a) is an intersect alloperator, ∩_(d) is an intersect distinct operator, p_(X) is a selectionpredicate on operand X, p_(Y) is a selection predicate on operand Y,p_(X) and p_(Y) are obtained from p by renaming attributes referenced inp to corresponding attributes in the operands X and Y, respectively, andσ is a selection operator.
 86. The method of claim 78 above, wherein theapplying step further comprises the step of replacing: ##EQU118##wherein X=(R_(X), X_(X), E_(X)), Y=(R_(Y), V_(Y), E_(Y), and Z=(R_(Z),V_(Z), E_(Z)) are relational expressions, p is a predicate on outputattributes of an intersection operation ∩ε(∩_(d),∩_(a)), ∩_(a) is anintersect all operator, ∩_(d) is an intersect distinct operator, p isobtained by replacing each attribute in J_(XY) by their correspondingunion-compatible attributes from Z, is a join operator and σ_(p) is aselection operator.